Over the next several years it is likely that you'll see a subtle but important change in 
the nature of standardized tests that are administered as part of your state and district 
testing programs. This change results from a desire to improve both the norm- and 
criterion-referenced interpretations of student, school, district, and state testing data. 
These interpretations can be improved by customizing the traditional norm-referenced 
test. 

Norm-referenced tests are designed to give you both normative and objective 
information. Normative information may take the form of scale scores, percentile ranks, 
grade equivalents, normal curve equivalents, and stanines. Objective performance is 
usually reported as a percentage mastery score based on the objectives included on the 
norm-referenced test. 
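
As a rough illustration of how some of these normative scores relate to one another, the sketch below converts a national percentile rank to a normal curve equivalent (NCE) and a stanine. It assumes the conventional definitions: NCE = 50 + 21.06z, where z is the normal deviate for the percentile rank, and stanine bands covering 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of the norm group. The function names are illustrative only, and a publisher's norms tables, not this arithmetic, remain the authoritative source for any particular test.

```python
from statistics import NormalDist

# Lowest percentile rank that falls in stanines 2 through 9
# (conventional boundaries; an assumption, not taken from any specific test).
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]


def percentile_to_nce(percentile_rank: float) -> float:
    """Convert a percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0 < percentile_rank < 100:
        raise ValueError("percentile rank must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return 50 + 21.06 * z


def percentile_to_stanine(percentile_rank: float) -> int:
    """Convert a percentile rank to a stanine from 1 (lowest) to 9 (highest)."""
    if not 0 < percentile_rank < 100:
        raise ValueError("percentile rank must be strictly between 0 and 100")
    return 1 + sum(percentile_rank >= cutoff for cutoff in STANINE_CUTOFFS)


if __name__ == "__main__":
    for pr in (10, 50, 90):
        print(pr, round(percentile_to_nce(pr), 1), percentile_to_stanine(pr))
```

A percentile rank of 50 maps to an NCE of 50 and a stanine of 5. Away from the middle, NCEs and percentile ranks diverge because NCEs are spaced in equal units on the normal curve.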

Normative scores allow you to compare individuals and groups with national 
performance levels, and objective scores allow you to make comparisons relative to 
specific objectives. Together, these scores allow you to plan programs for your school 
and district and instruction for individual students. 

When used correctly, this information is invaluable for school administrators. However, 
several improvements would allow you to make even better programmatic and individual 
plans. These include 

o reducing testing time, 

o increasing the relevance of the test to the curriculum, and 

o having greater confidence in the national comparative information. 

These improvements are the goals of custom-made norm-referenced tests. 

Several models for constructing custom-made norm-referenced tests have been 
attempted, with some degree of success. A discussion of three models follows. 

A MODEL USED IN TEXAS 

For the last few years, Texas has used a model in which a state criterion-referenced test 
was statistically equated to a nationally normed norm-referenced test. Texas now 
administers the criterion-referenced test instead of the norm-referenced test, and both 
norm-referenced and criterion-referenced scores are produced. 

The advantages of this approach are reduced testing time and greater relevance to the 
Texas curriculum than could be obtained from using the norm-referenced test alone. 

However, this approach has several disadvantages: 

o Equating these two different tests will result in inaccurate norm-referenced scores 
because of differences in test difficulty and content between the norm-referenced and 
criterion-referenced tests. Criterion-referenced scores are unaffected by the equating. 

o Instruction focused on the curriculum will likely increase both the criterion-referenced 
scores and, as a result, the equated norm-referenced scores. Although score increases 
on the criterion-referenced portion of the test may accurately reflect student learning in 
these restricted domains, this is not the case for the much broader norm-referenced 
domains. 

This is because instruction has been effectively focused on only a portion of the traits 
measured by norm-referenced tests, thus producing higher equated norm-referenced 
scores than would be expected if the original norm-referenced test or a proper sample 
of items from that test were administered. 

When this distortion happens, the norm-referenced scores produced from this model 
are called norm-invalid. That is, the customized test does not accurately reproduce the 
normative scores that would have resulted had the entire norm-referenced test been 
administered. 
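
The mechanism can be sketched with a simple linear equating, which maps a criterion-referenced score onto the norm-referenced scale by matching means and standard deviations. This is only an illustration: the source does not say which equating method Texas used, the class and function names are invented for this sketch, and every number below is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearEquating:
    """Linear equating fixed in a baseline year (all values hypothetical)."""
    crt_mean: float  # baseline mean on the criterion-referenced test
    crt_sd: float    # baseline standard deviation on that test
    nrt_mean: float  # baseline mean on the norm-referenced scale
    nrt_sd: float    # baseline standard deviation on that scale

    def to_nrt_scale(self, crt_score: float) -> float:
        """Map a criterion-referenced score onto the norm-referenced scale."""
        z = (crt_score - self.crt_mean) / self.crt_sd
        return self.nrt_mean + self.nrt_sd * z


def norm_invalidity_gap(equating: LinearEquating,
                        crt_mean_now: float,
                        full_nrt_mean_now: float) -> float:
    """Equated mean minus the mean the full norm-referenced test would give."""
    return equating.to_nrt_scale(crt_mean_now) - full_nrt_mean_now


# Hypothetical baseline year in which the equating was established.
equating = LinearEquating(crt_mean=30.0, crt_sd=6.0, nrt_mean=650.0, nrt_sd=40.0)

# Hypothetical later year: targeted instruction lifts the criterion-referenced
# mean by 3 points, while the full norm-referenced test would have shown a
# mean of only 655 because most of its broader domain was not targeted.
gap = norm_invalidity_gap(equating, crt_mean_now=33.0, full_nrt_mean_now=655.0)
print(gap)  # 15.0 scale-score points of inflation under these assumptions
```

A gap near zero would indicate that the equated scores reproduce what the full norm-referenced test would have reported. A persistent positive gap is the distortion described above.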

For a custom-made norm-referenced test to be fair, the scores must be norm-valid 
(Yen, Green, and Burket, 1987). Texas will leave this model in 1990 in favor of one that 
may be more successful in producing scores that approach norm-validity. 

A SECOND MODEL 



A second model of a custom-made test is one in which state- or district-developed 
criterion-referenced items are combined with a complete norm-referenced test. 
Norm-referenced scores are generated from the complete norm-referenced test, while 
objective information is derived from a combination of norm-referenced and locally 
developed items. 

This type of test reduces testing time because only one customized test is administered 
instead of both a norm-referenced and a criterion-referenced test. However, as with the 
Texas model discussed above, norm-invalidity may be a problem. 

If instruction is carefully targeted at the objectives and a subset of the norm-referenced 
test items is used for reporting achievement by objective, then norm-invalidity could 
result because instruction influences only a portion of the trait measured by the 
norm-referenced test. In this case, the norm-referenced scores could be inflated by the 
targeted instruction, thus rendering them invalid. 

A MODEL USED IN TENNESSEE 



Another model of a customized test was recently adopted by the State of Tennessee. 
The Tennessee model remedies the shortcomings of the first two models that we 
described. This model pairs a norm-referenced module of approximately 40 items, instead 
of a full-length test of 80 to 110 items, with a criterion-referenced module of 
state-developed items. 

The norm-referenced module was specifically created so that it has proper statistical 
characteristics of reliability, adequate floors and ceilings, and articulation across test 
levels. Tennessee will use multiple test forms. 
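
One reason the shortened module has to be built deliberately is that reliability generally falls as a test gets shorter. The Spearman-Brown formula gives a rough projection, assuming the retained items are comparable to those dropped. The full-length reliability of 0.95 and the 80-item starting length used below are hypothetical inputs chosen for illustration, not figures from the Tennessee program.

```python
def spearman_brown(reliability: float, length_ratio: float) -> float:
    """Project reliability after changing test length by `length_ratio`.

    `length_ratio` is new length divided by original length, so 0.5 means
    the test is cut in half. Assumes the items are roughly parallel.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    if length_ratio <= 0:
        raise ValueError("length_ratio must be positive")
    return (length_ratio * reliability) / (1 + (length_ratio - 1) * reliability)


# Hypothetical: an 80-item test with reliability 0.95 cut to 40 items.
projected = spearman_brown(reliability=0.95, length_ratio=40 / 80)
print(round(projected, 3))
```

With these assumed inputs the projection is about 0.90. A drop of that kind is one reason a short module generally calls for careful item selection rather than simply dropping items from the longer test.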

Items used for the norm-referenced portion are not intended to be used for objective 
scores, and the criterion-referenced items are not used as part of the norm-referenced 
scores. 

Effective instruction targeted toward the state objectives will demonstrate student 
attainment of the state's objectives, and the norm-referenced portion will provide 
norm-valid scores. Thus, the Tennessee model reduces testing time and requires only 
one testing period rather than two. The objective scores will be useful for instructional 
planning and the norm-referenced scores can be used with confidence for national 
comparisons. 

A NOTE ABOUT NORM-VALIDITY 

As a school administrator, you should be concerned about the norm-validity of your 
district's test scores. During times of increased school, district, state, and national 
achievement (as we see now), critics may be quick to question the validity of your test 
results. Critics may point out that teachers are too familiar with the test items, that they 
teach actual test items, or that the scores may not reflect true changes in achievement. 
Williams (1988) and Koretz (1988a, 1988b) have both presented a distinction between 
changes in test scores and changes in achievement. 

Changes in test scores may result from a variety of instructional and administrative 
interventions, but changes in test scores may not reflect actual changes in achievement. 
Special coaching, inappropriate test preparation materials and methods, and narrowly 
targeted instruction may all increase test scores, but they do not necessarily lead to 
sustained and abiding increases in achievement. 

Just as instruction must support test score changes that are not spurious, i.e., changes that 
reflect true growth, test instruments must be designed and implemented so that if score 
increases occur, they represent a true change in achievement and are not the result of 
an inadequately designed customized testing program. 

Unless a customized norm-referenced test produces norm-valid scores, you cannot 
provide test results that reflect true changes in achievement. Even with an optimally 
designed customized test, abuses can still result. But without a properly designed 
customized norm-referenced test, you cannot demonstrate that achievement, rather 
than just test scores, has improved. 

Administrators at all levels must be able to tell the difference between norm-valid tests 
that allow actual achievement to be demonstrated and norm-invalid ones. When 
norm-valid tests are used, you can report the test results with confidence. 

If you have confidence in the test's quality, you can trust that test scores accurately reflect 
meaningful changes in student achievement. Thus, you will be able to determine the 
effectiveness of your instructional program. 

If you have a norm-valid test, you can show your constituents that changes in the test 
scores are real. When these changes represent increases, your community and staff 
can be satisfied that the instructional program works in the areas the test measures. If the 
score changes represent a decrease, then the test results can help you identify areas 
that need additional instructional effort. In either case, the students win because 
instructional support is forthcoming. 

Customized norm-referenced tests offer a viable alternative to both norm-referenced 
and criterion-referenced tests. One test, instead of two, is all that needs to be 
administered. Disruption in the schools is reduced, testing time is reduced, and 
instructional time is maximized. Alternate forms of customized norm-referenced tests 
can be used, minimizing criticisms of test familiarity and inappropriate test preparation 
activities. Teachers will be more likely to teach the complete curriculum, and increased 
achievement, rather than just increased scores, can result. 